AD-A164 204  A MODIFIED KOLMOGOROV-SMIRNOV ANDERSON-DARLING AND
CRAMER-VON MISES TEST F.. (U) AIR FORCE INST OF TECH
WRIGHT-PATTERSON AFB OH SCHOOL OF ENGI.. F OCASIO
UNCLASSIFIED DEC 85 AFIT/GSO/MA/85D-5 F/G 12/1




AD-A164  204 


AFIT/GSO/MA/85D-5


A MODIFIED KOLMOGOROV-SMIRNOV,
ANDERSON-DARLING, AND CRAMER-VON MISES TEST
FOR THE CAUCHY DISTRIBUTION
WITH  UNKNOWN  LOCATION  AND  SCALE  PARAMETERS 

THESIS 

FRANK  OCASIO 
CAPTAIN, USAF

AFIT/GSO/MA/85D-5


Approved  for  public  release;  distribution  unlimited 




AFIT/GSO/MA/85D-5


A MODIFIED KOLMOGOROV-SMIRNOV, ANDERSON-DARLING, AND
CRAMER-VON  MISES  TEST  FOR  THE  CAUCHY  DISTRIBUTION 
WITH  UNKNOWN  LOCATION  AND  SCALE  PARAMETERS 




THESIS 




Presented  to  the  Faculty  of  the  School  of  Engineering 
of  the  Air  Force  Institute  of  Technology 
Air  University 
In  Partial  Fulfillment  of  the 
Requirements  for  the  Degree  of 
Master  of  Science  in  Space  Operations 


Frank  Ocasio,  B.S.,  M.S. 
Captain,  USAF 

December  1985 




Preface

The purpose of this study was to produce a set of critical value tables
for the Cauchy distribution using three popular goodness-of-fit tests: the
Kolmogorov-Smirnov, the Anderson-Darling, and the Cramer-von Mises. These
tables will allow anyone doing hypothesis testing to test a null hypothesis
involving the Cauchy. To determine the confidence the user may have when using
these tables, a power comparison was run against several alternate distributions.

When preparing this thesis, I received a great deal of help and support
from others. My faculty advisor, Dr. A.H. Moore, helped keep me within the
original scope of the thesis effort, which made it possible to finish on time.
Capt. Jim Porter was very helpful in finalizing my computer programs, and
without his help I would still be working on those programs. My two little
boys, Mike and Matt, helped me by maintaining my overall perspective and
providing me enough breaks to maintain my sanity. Finally, my wife Kellie
deserves more thanks than she will probably ever get, as she supported me
through my numerous long nights during the thesis preparation.

Frank  Ocasio 




Table of Contents

Preface
Abstract

I. Introduction
    Chapter Overview
    Background
    Problem Statement
    Research Question
    Research Objectives

II. Goodness of Fit Tests
    Chapter Overview
    Hypothesis Testing
    GOF Tests
    EDF Statistics

III. The Cauchy Distribution
    Chapter Overview
    Definition
    Properties of the Cauchy
    Uses for the Cauchy
    Estimation

IV. Methodology
    Chapter Overview
    Generating the Critical Value Tables
    The Power Study

V. Use of the Tables
    Chapter Overview
    Use of the Tables


AFIT/GSO/MA/85D-5


Abstract 

The Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises
critical values are generated for the Cauchy distribution. The critical values
are used for testing the null hypothesis that a set of observations follows a
Cauchy distribution when the location and scale parameters are unknown and
estimated from the sample. A Monte Carlo simulation, using 5000 repetitions,
was used to generate the critical values for sample sizes of 5(5)30 and 50.

A power study was performed using Monte Carlo simulation for the
Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises tests. Sample
sizes of 5, 15, 25, and 50 were used for six alternate distributions, for alpha
levels of .05 and .01. Analyzing by sample size shows very poor power for a
sample size of five. As the sample size increases so does the power, so that
at a sample size of fifty, the power against three of the six distributions is
.5 or better. Among the three tests, the Kolmogorov-Smirnov is consistently
more powerful, regardless of sample size or alpha level.


A MODIFIED KOLMOGOROV-SMIRNOV,
ANDERSON-DARLING, AND CRAMER-VON MISES TEST
FOR  THE  CAUCHY  DISTRIBUTION 
WITH  UNKNOWN  LOCATION  AND  SCALE  PARAMETERS 


I.  Introduction 


Chapter Overview

This  chapter  gives  an  outline  of  the  scope  of  this  thesis.  Some 
background  will  be  covered  on  data  analysis  and  modeling,  tying  that  into 
goodness-of-fit  testing.  Then  the  Problem  Statement,  the  Research  Question, 
and  the  Research  Objectives  will  be  given. 

Background 

When data are being analyzed, one of the first things to do is develop a
valid model of that data. This is a four-step process (5:332), with the first
step being data collection. The next step is to analyze the empirical data
distribution and attempt to match it against a known distribution. This is
done using a histogram, which gives a visual image of the data distribution.
Third, the parameters of that known distribution, most often location and
scale, are estimated from the data. A familiar example of these parameters
is the mean and variance of the Normal distribution. The fourth step is to
apply goodness-of-fit tests. Here a null hypothesis ($H_0$) is proposed which
states that the actual distribution of the data is the known distribution,
whereas the alternate hypothesis ($H_1$) is that the actual distribution is not
the known distribution. The tests measure the fit between the empirical and
known distributions. To use the tests, statistics are calculated from the data
and compared to critical value tables which have been developed for various
distributions. The comparison will result in accepting or rejecting $H_0$. If
$H_0$ is rejected, the process is repeated, starting with the second step.

The three goodness-of-fit tests used for this thesis apply different
techniques to determine fit. The Kolmogorov-Smirnov (KS) test uses the
absolute difference between the empirical and known distributions. A
problem with the KS test is that it tends to show smaller discrepancies in the
tails than near the median of the distribution (39:6). One way to
overcome this problem is to use the squared differences between the
distributions. The Anderson-Darling (AD) test uses a weighted squared
difference and the Cramer-von Mises (CVM) uses only the squared difference.

This thesis will look at the Cauchy distribution. It is similar in shape to
the Normal except that it has longer and flatter tails (21:154). In physics,
the Cauchy is used in modeling Brownian motion (32:161).

Problem Statement

Highly  accurate  goodness-of-fit  tests  have  not  been  developed  for  the  Cauchy 
distribution  with  unknown  location  and  scale  parameters.  These  tests  would 
require  critical  value  tables  based  on  the  data  sample  size  and  parameters. 

Research  Question 

How can the KS, AD, and CVM tests be modified for the Cauchy distribution
when  the  location  and  scale  parameters  are  unknown? 

Research  Objectives 

1. Generate and document critical value tables for the modified
Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises
goodness-of-fit tests.

2. Do a power study of the Kolmogorov-Smirnov, Anderson-Darling,
and Cramer-von Mises tests to determine the most powerful. The
power is the probability of rejecting $H_0$ when $H_1$ is true (6:79). The
higher the power, the greater the confidence in the test results.


II. Goodness of Fit Tests


Chapter  Overview 

This chapter will develop the background for goodness-of-fit (GOF) tests.
First, hypothesis testing will be covered as an introduction to GOF. This will
be followed by a look at GOF tests. The Chi-Square ($\chi^2$) test will be
covered as the most common of these tests. Then the concept of the empirical
distribution function (EDF), and its use in GOF, will be discussed. Finally,
the EDF tests which will be used in this thesis, the Kolmogorov-Smirnov (KS),
the Anderson-Darling (AD), and the Cramer-von Mises (CVM), will be introduced.

Hypothesis  Testing 

In hypothesis testing, a specific statement (called the hypothesis) is
made about a population. Then a sample is taken from that population. Based
on that sample, a decision is made for or against accepting the hypothesis
(7:75). That decision is based on the following test procedure (7:75-77):

1. A hypothesis to be tested, the null hypothesis ($H_0$), is made about
the population. The negative of $H_0$ is also set up and labelled $H_1$.

2. To make the decision, a test statistic is used. This statistic
would assign real numbers to points in the sample space and allow
ordering of those points based on their ability to tell the
difference between a true and a false $H_0$.

3. A rule is established to determine which values of the statistic
will allow acceptance and which rejection. For this thesis, larger
values of the statistic tend toward rejection of $H_0$. That value of
the test statistic which is the cutoff between accepting and
rejecting is called the critical value, and if the test statistic is
greater than that value, $H_0$ is rejected.

4. A random sample is taken from the population. Based on that
sample the test statistic is evaluated, and the hypothesis is then
either accepted or rejected.

The sample that is taken is only part of the population, and therefore
contains only part of the total information available. This leads to a
possibility of error when deciding whether to accept or reject $H_0$. This error
can surface in two ways: $H_0$ can be rejected when it is actually true, which is
called a Type I error; or $H_0$ can be accepted when it is actually false, which is
called a Type II error (7:78). Since hypothesis testing is concerned with
minimizing these errors, the maximum probabilities of making these errors
have been given the labels α for Type I and β for Type II. Related to β is
the parameter of power, or 1 - β, the probability of rejecting $H_0$ when it is false.

The basic thrust behind hypothesis testing is to reject $H_0$, while with
GOF testing the reverse is true (1:72).

GOF  Tests 

In GOF tests, $H_0$ is that a selected distribution fits the distribution
underlying the population sample. One common way to get that selected
distribution is to plot the sample data points using a histogram and pick a
distribution that visually matches that histogram.

GOF tests try to determine if there is any evidence of disagreement
between the sample and the selected distribution (1:72). The assumption is
that the sample data fits the distribution unless there is enough evidence to
disprove that assumption. An intuitive approach to collecting the evidence is
to first plot the sample distribution function:

$$ F_n(x) = \frac{r}{n} \tag{1} $$

where r = number of $x_i \le x$. Then compare $F_n(x)$ with the assumed distribution,
and visually inspect for substantial disagreement (11:290). However, to
attain accurate, reproducible results some standard is required to measure
the discrepancy. This is where the GOF tests come in.

The best-known GOF test is the Chi-Square (1:73). The test first groups
the sample data into classes, then compares the observed frequency of $F_n(x)$ in
each of the classes with the expected frequency of the assumed distribution
(39:2). The test statistic is (1:73):

$$ \chi^2 = \sum_{i=1}^{k} \frac{(f_o - f_e)^2}{f_e} \tag{2} $$

where

$f_o$ = the observed frequency per class
$f_e$ = the expected frequency per class
k = the number of classes

Some of the advantages of this test are that it is good for a discrete
distribution and that the statistic can be adjusted if the parameters of Fn(X) are
estimated from the sample (33:731). A disadvantage is that the sample sizes
must be fairly large (n > 25) for the test to work. This minimum n is to allow
sufficient data points in each class to calculate the test statistic (1:73).
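The Chi-Square computation is simple enough to sketch directly. The following is a minimal illustration, assuming the usual form of the statistic (the sum over the k classes of the squared difference between observed and expected frequency, divided by the expected frequency). The function name and the die-roll data are hypothetical and are not taken from the thesis.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over the k classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need one entry per class")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical data: 60 rolls of a die, so 10 rolls are expected per face.
observed = [8, 12, 9, 11, 10, 10]
expected = [10.0] * 6
print(chi_square_statistic(observed, expected))
```

Each class needs a nonzero expected frequency, which is one way to see why the test needs enough data points per class.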
Another set of GOF tests uses statistics based on the sample, or
empirical, distribution function, otherwise known as EDF statistics (33:732).
With these tests, a comparison is made between $F_n(x)$ (the EDF) and $F(x)$, the
assumed cumulative distribution function (CDF), to see if they match (35:1).
$F_n(x)$ is defined above, where the n values of $x_i$ are a random sample from X.
From the $x_i$, if $x_{(1)}, \ldots, x_{(n)}$ are set up as ascending order statistics, then
$F_n(x)$ is defined by (35:1)


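$$ F_n(x) = 0, \qquad x < x_{(1)} \tag{3} $$

$$ F_n(x) = \frac{i}{n}, \qquad x_{(i)} \le x < x_{(i+1)}, \quad i = 1, \ldots, (n-1) \tag{4} $$

$$ F_n(x) = 1, \qquad x_{(n)} \le x \tag{5} $$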

The expectation is that $F_n(x)$, the proportion of the random sample $\le x$, would
give a good estimate of $F(x)$, the probability of $X \le x$, which it does (35:1).
This leads to the development of the EDF statistics, which use the discrepancy
between $F_n(x)$ and $F(x)$ to determine if the sample comes from $F(x)$.
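The step-function definition of the EDF translates into a short routine. This is a minimal sketch: the name `make_edf` and the four-point sample are illustrative only. It counts the ordered observations that are less than or equal to x and divides by n.

```python
from bisect import bisect_right


def make_edf(sample):
    """Return a function giving the proportion of the sample <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def edf(x):
        # bisect_right counts how many ordered values are <= x.
        return bisect_right(ordered, x) / n

    return edf


edf = make_edf([2.0, -1.0, 0.5, 3.0])
for x in (-2.0, -1.0, 0.5, 2.9, 3.0):
    print(x, edf(x))
```

The function is 0 below the smallest observation, jumps by 1/n at each observation, and is 1 from the largest observation onward.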

Some advantages of using EDF statistics are that, unlike the
Chi-Square, they can be used with small sample sizes, and, when $F(x)$ is fully
specified, they are more powerful than the Chi-Square test (33:732). One
disadvantage is that EDF statistics cannot be used for discrete distributions.
Another disadvantage was that, initially, EDF statistics could only be
used when $F(x)$ was fully specified. This was due to the use of the probability
integral transformation, which, when used with a fully specified CDF, will
convert the values of that CDF to ordered values from zero to one based on a
uniform distribution (39:5). If the parameters of $F(x)$ were estimated, the
cumulative distribution of the EDF statistics would depend not only on the
sample size, but also on the value of the unknown parameters (33:731). This
limitation to a fully specified $F(x)$ prevented the widespread use of EDF
statistics, since the parameters of the assumed distribution are usually not
known beforehand and must be estimated from the sample data.

In 1948, David and Johnson (10) changed that when they showed that if
invariant estimates of only the location and scale parameters are taken from
the sample data, then the cumulative distribution of the EDF statistics will
depend on the functional form of $F(x)$, not on the estimated parameters. This
cleared the way for modified (using estimated parameters) tables of critical
values to be generated for a variety of distributions which would depend only
on sample size and significance level (α). The first tables were produced by
H. W. Lilliefors for the normal (25) and exponential (26) distributions.
J. G. Bush did tables for the Weibull distribution (6), and P. J. Viviano did so
for the Gamma distribution (36). Green and Hegazy did tables for the Uniform,
Normal, Laplace, Exponential, and Cauchy distributions (16). This thesis will
do a new set of critical value tables for the Cauchy because Green and Hegazy
did not use the same estimating technique for parameter estimation, and they
did not use the bootstrap interpolation technique, which will be discussed in
Chapter IV.


As with all the GOF tests, the intent in hypothesis testing when using
EDF statistics is to accept $H_0$. This can make power problems a significant
concern. The desire would be for the results of the testing to be powerful,
i.e., to accept $H_0$ and also feel confident that the alternate hypothesis is false.
However, this is not always the case. One problem is that though EDF
statistics can be used with small sample sizes, the results are not very
powerful (29:3). For example, when Green and Hegazy did power studies on
their statistics, for n = 5, the power was never greater than 0.5 (16). Another
problem is that the statistics are more powerful against some distributions
than others (*21:3). This makes the results of power studies helpful to anyone
using these statistics, since, assuming $H_0$ is accepted, the power study can be
referenced to determine how much confidence can be had in the results.


EDF Statistics

This thesis will work with three EDF statistics: the Kolmogorov-Smirnov
statistic (KS), the Cramer-von Mises statistic (CVM), and the
Anderson-Darling statistic (AD).

The KS statistic is defined as (*21:15):

$$ D = \max_i \left| F_n(x_i) - F(x_i) \right| \tag{6} $$

where

the $x_i$ are ordered
$F(x_i)$ is the CDF value of the data point


It  is  based  on  the  greatest  vertical  difference  between  the  two  functions 
(35:2),  and  has  these  advantages  over  the  Chi-Square  (28:76): 

-  It  does  not  lose  information  by  grouping  whereas  the  Chi-Square 
does,  and  this  information  loss  is  large  for  small  samples,  making 
the  KS  statistic  a  better  choice  for  small  samples. 

-  The  KS  statistic  is  easier  to  determine  computationally. 

One problem with the KS is its insensitivity to differences in the tails, since
both  functions  tend  to  0  and  1  in  those  extremes  (12:3). 
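The greatest vertical difference can be computed directly from the ordered sample. The following is a minimal sketch, not the thesis program: it assumes the common two-sided computational form, which compares the hypothesized CDF with the EDF just after and just before each jump. The names `ks_statistic` and `uniform_cdf` and the three-point sample are illustrative.

```python
def ks_statistic(sample, cdf):
    """Largest vertical gap between the EDF of the sample and cdf."""
    ys = [cdf(x) for x in sorted(sample)]
    n = len(ys)
    # The EDF equals (i + 1)/n just after the i-th ordered point (0-based)
    # and i/n just before it, so both sides of each jump are checked.
    d_plus = max((i + 1) / n - y for i, y in enumerate(ys))
    d_minus = max(y - i / n for i, y in enumerate(ys))
    return max(d_plus, d_minus)


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy hypothesis."""
    return min(max(x, 0.0), 1.0)


print(ks_statistic([0.7, 0.1, 0.4], uniform_cdf))
```

No grouping into classes is needed, which is the information-preserving property the list above credits to the KS statistic.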

A more flexible set of statistics is the Cramer-von Mises family, to
which both of the other statistics belong. This family incorporates a weight
function, $\psi(x)$, which allows weighting the deviations based on the
importance of different portions of the distribution function (2:194). These
statistics are based on the integral of the weighted squared difference
between the assumed distribution and the EDF (35:2):

$$ W^2 = \int_{-\infty}^{\infty} \left[ F_n(x) - F(x) \right]^2 \psi(x) \, dx \tag{7} $$

The CVM statistic is $W^2$ with $\psi(x) = 1$ (35:2). The computational form of
this statistic is (16:205):

$$ W^2 = \frac{1}{12n} + \sum_{i=1}^{n} \left[ Y_i - \frac{2i-1}{2n} \right]^2 \tag{8} $$

where $Y_i = F(x_{(i)})$.
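The computational form in equation (8) can be evaluated with a single pass over the ordered sample. This is a minimal sketch under the assumption that the hypothesized CDF is supplied as a Python function; `cramer_von_mises_statistic` and `uniform_cdf` are illustrative names.

```python
def cramer_von_mises_statistic(sample, cdf):
    """W^2 = 1/(12n) + sum over i of [Y_i - (2i - 1)/(2n)]^2."""
    ys = [cdf(x) for x in sorted(sample)]  # Y_i = F(x_(i))
    n = len(ys)
    total = sum((y - (2 * i - 1) / (2 * n)) ** 2
                for i, y in enumerate(ys, start=1))
    return 1.0 / (12 * n) + total


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1], used as a toy hypothesis."""
    return min(max(x, 0.0), 1.0)


print(cramer_von_mises_statistic([0.7, 0.1, 0.4], uniform_cdf))
```

For this toy sample the terms are 1/36, 1/225, 1/100, and 4/225, which sum to 0.06.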

The AD statistic sets the weight function equal to the inverse of the
variance of $F(x)$ (35:2):

$$ \psi(x) = \left[ F(x) \{ 1 - F(x) \} \right]^{-1} \tag{9} $$


This assigns equal weights to each point of $F(x)$ (2:195), increasing the
weight given to the tails of the distribution and providing better detection of
differences in the tails than the KS or the CVM statistics (34:360).

The computational form of this statistic is (16:206):

$$ A^2 = -n - \frac{1}{n} \sum_{i=1}^{n} (2i - 1) \left[ \ln Y_i + \ln(1 - Y_{n+1-i}) \right] \tag{10} $$
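Equation (10) pairs each ordered CDF value with its mirror image from the other end of the sample. The sketch below assumes the standard reading of that formula, with $Y_i = F(x_{(i)})$ and the second logarithm taken at $Y_{n+1-i}$. The function name and the identity CDF used in the example are illustrative.

```python
import math


def anderson_darling_statistic(sample, cdf):
    """A^2 = -n - (1/n) * sum (2i - 1) [ln Y_i + ln(1 - Y_(n+1-i))]."""
    ys = [cdf(x) for x in sorted(sample)]
    n = len(ys)
    total = 0.0
    for i in range(1, n + 1):
        # ys[i - 1] is Y_i and ys[n - i] is Y_(n+1-i) in 1-based notation.
        total += (2 * i - 1) * (math.log(ys[i - 1]) + math.log(1.0 - ys[n - i]))
    return -n - total / n


# Toy hypothesis: uniform on (0, 1), so the CDF is the identity there.
print(anderson_darling_statistic([0.2, 0.5, 0.9], lambda x: x))
```

The logarithms grow without bound as a CDF value approaches 0 or 1, which is how the extra weight on the tails shows up in practice.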


III. The Cauchy Distribution


Chapter Overview

This chapter will discuss various aspects of the Cauchy distribution.
The first aspect is the definition of the distribution, covering the pertinent
equations. Then the properties of the Cauchy are covered, followed by a brief
glance at some of its uses. Finally, parameter estimation for the Cauchy will
be discussed, from a general look at estimation to a discussion of the
estimation technique to be used in this thesis.

Definition 

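The Cauchy probability density function (PDF) is (21:154):

$$ f(x) = \left\{ \pi \lambda \left[ 1 + \left( \frac{x - \theta}{\lambda} \right)^2 \right] \right\}^{-1} \tag{11} $$

where

$\lambda > 0$
$\lambda$ is the scale parameter
$\theta$ is the location parameter

The CDF for the Cauchy is (17:404):

$$ F(x) = \frac{1}{2} + \frac{1}{\pi} \tan^{-1} \left[ \frac{x - \theta}{\lambda} \right] \tag{12} $$

The characteristic function is (22:11):

$$ C_x(t) = \exp(it\theta - |t|\lambda) \tag{13} $$

The kth partial derivative of $C_x(t)/i^k$ with respect to t, when evaluated at
t = 0, is the kth moment (22:11).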
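The density and distribution function of equations (11) and (12) are easy to evaluate numerically. This is a minimal sketch; the function names and the default parameter values (location 0, scale 1) are choices made for illustration.

```python
import math


def cauchy_pdf(x, theta=0.0, lam=1.0):
    """Cauchy density with location theta and scale lam > 0."""
    z = (x - theta) / lam
    return 1.0 / (math.pi * lam * (1.0 + z * z))


def cauchy_cdf(x, theta=0.0, lam=1.0):
    """Cauchy cumulative distribution function."""
    return 0.5 + math.atan((x - theta) / lam) / math.pi


# The location is the median, and theta +/- lam are the quartiles.
print(cauchy_cdf(3.0, theta=3.0, lam=2.0))  # median
print(cauchy_cdf(5.0, theta=3.0, lam=2.0))  # upper quartile
print(cauchy_pdf(0.0))                      # peak height of the standard Cauchy
```

Because the moments do not exist, the location and scale are best read as the median and the half-width between the quartiles rather than as a mean and standard deviation.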

Properties  of  the  Cauchy 

Given $C_x(t)$, an evaluation of the first partial derivative with respect to t
at t = 0 yields an imaginary solution, resulting in all higher partials being
imaginary (22:11). This leads to an oddity of the Cauchy, namely, that it has
no moments of order $\ge 1$, and therefore has no finite expected value or
standard deviation (21:154).

Though it has no finite expected value, the Cauchy is symmetric about its
location parameter, and is a member of the symmetric stable family (24:133).
This symmetry is similar to the Normal, except for the longer and flatter
tails of the Cauchy (21:154).

There  are  some  other  properties  to  note  (30:303-305): 


1.  The  distribution  of  the  reciprocal  of  a  Cauchy  variable  is  the  same 
as  that  of  the  variable. 

2.  The  arithmetic  means  of  samples  from  the  Cauchy  have  the  same 
distribution  as  the  Cauchy. 

3.  The  distribution  of  the  product  and  quotient  of  two  Cauchy 
variables  is: 

$$ f(u) = \left[ \pi^2 (u^2 - 1) \right]^{-1} \log(u^2) \tag{14} $$
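Property 2 can be checked empirically. The sketch below is an illustration, not part of the thesis: it draws standard Cauchy deviates by inverting the CDF (location 0, scale 1), averages them in groups, and reports the fraction of averages that fall between the quartiles -1 and +1. If the averages are themselves standard Cauchy, that fraction should stay near 0.5 no matter how large the groups are. The seed, group sizes, and function names are arbitrary choices.

```python
import math
import random


def standard_cauchy(rng):
    """Invert the standard Cauchy CDF at a uniform random number."""
    return math.tan(math.pi * (rng.random() - 0.5))


def fraction_of_means_within_one(sample_size, reps, seed=12345):
    """Share of sample means that land between the quartiles -1 and +1."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        mean = sum(standard_cauchy(rng) for _ in range(sample_size)) / sample_size
        if abs(mean) <= 1.0:
            hits += 1
    return hits / reps


for size in (1, 10, 50):
    print(size, fraction_of_means_within_one(size, 5000))
```

For a distribution with a finite variance, the same experiment would show the fraction climbing toward 1 as the group size grows.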

Uses  for  the  Cauchy 

As a member of the symmetrical stable family of distributions, the
Cauchy has applications in economic modeling and estimation (15:275).
Time-series and cross-section data for such things as personal incomes,
stock and commodity price changes, and employment measures of businesses
often were assumed to behave as normally distributed random variables.
However, frequency functions consistently come up with too much mass in the
tails to be accounted for by the normal. The Cauchy, with its longer and
flatter tails, allows for that mass. This backs up the statement made by
Haas and Bain that "the Cauchy distribution should be considered as a possible
model whenever one needs a density function with heavier tails than the
normal distribution allows" (17:403).

Fig. 1 is a geometrical application of the Cauchy distribution (21:161).
In this model the Cauchy distribution represents the distribution of P, the
point of intersection of a variable straight line with a fixed straight line.
The variable straight line is randomly oriented in two dimensions through the
fixed point A. The result is that the distance OP is Cauchy distributed with
$\theta = 0$.


Fig  1.  Cauchy  Distribution  Model  (21.100 


Using  this  model,  the  Cauchy  distribution  can  represent  the  distribution 
of  points  where  particles  from  a  point  source,  shown  as  A,  impact  a  fixed 
straight  line  (21:161).  This  is  used  in  physics,  where  the  Cauchy  distribution 
is used to help describe the motion of a random point in standard Brownian
motion  (32). 
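The geometric model can be simulated in a few lines. The sketch assumes the usual setup for this construction, which cannot be confirmed from the figure here: O is the foot of the perpendicular from A to the fixed line, the line through A makes an angle with AO that is uniform on (-π/2, π/2), and the distance AO plays the role of the scale parameter. All names and numbers below are illustrative.

```python
import math
import random


def intersection_points(distance, count, seed=2024):
    """Signed distances OP for randomly oriented lines through A."""
    rng = random.Random(seed)
    points = []
    for _ in range(count):
        angle = math.pi * (rng.random() - 0.5)  # uniform on (-pi/2, pi/2)
        points.append(distance * math.tan(angle))
    return points


def cauchy_cdf(x, theta, lam):
    return 0.5 + math.atan((x - theta) / lam) / math.pi


points = intersection_points(distance=2.0, count=20000)
for x in (-2.0, 0.0, 2.0):
    empirical = sum(p <= x for p in points) / len(points)
    print(x, round(empirical, 3), cauchy_cdf(x, 0.0, 2.0))
```

The empirical proportions should track the Cauchy CDF with location 0 and scale equal to the distance AO.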

Estimation 

Estimation is a procedure that allows generalizing from a sample to a
population (37:334). In this thesis the concern is with point estimation,
where a sample statistic is used to estimate a population parameter. There
are several desirable properties for point estimators (37:335-342):

1. Unbiasedness: where the expected value of the estimator ($\hat\theta$) is
equal to the parameter ($\theta$), i.e., $E(\hat\theta) = \theta$.

2. Consistency: where the larger the sample, the higher the
probability of $\hat\theta$ being close to $\theta$.

3. Relative Efficiency: that the estimator be more efficient (smaller
$\sigma$) than other estimators.

4. Sufficiency: that the estimator contain all the information
available in the data about the parameter.

One method of estimation uses the sample as the guide to the parameter
(37:345). With sample values $(x_1, x_2, \ldots, x_n)$, a likelihood function is set up:

$$ L(x_1, \ldots, x_n \mid \theta) \tag{15} $$

This is the likelihood of getting this particular sample, given some $\theta$. The
maximum likelihood principle says to take as an estimate of $\theta$ that value
which, while within the range of $\theta$, makes the likelihood function as large as
possible (23:35). For computational purposes, it is usually easier to work
with log L.

An attractive feature of the maximum likelihood estimator (MLE) is that
it is invariant (37:346). Invariance, in terms of the variables used above,
means that if $\hat\theta$ is the MLE of $\theta$ and $h(\theta)$ has an inverse, then
$h(\hat\theta)$ is an MLE of $h(\theta)$. An example is with a sample taken from a
normally distributed population. For this case $S^2$, the sample variance, is an
MLE of $\sigma^2$, the population variance. Invariance says that the sample standard
deviation, S, is also an MLE of the population standard deviation, $\sigma$. The
invariance of the MLE is important for this thesis, since invariant estimators
of the location and scale parameters are needed to develop critical value
tables when $F(x)$, the hypothesized distribution, is not fully specified.

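Applied to the Cauchy distribution, the likelihood function is (17:404):

$$ L(x_1, \ldots, x_n \mid \theta, \lambda) = \prod_{i=1}^{n} \left\{ \pi \lambda \left[ 1 + \left( \frac{x_i - \theta}{\lambda} \right)^2 \right] \right\}^{-1} \tag{16} $$

and the maximum likelihood equations are:

$$ \sum_{i=1}^{n} \left[ (x_i - \hat\theta) \hat\lambda^{-1} \right] \left[ 1 + (x_i - \hat\theta)^2 \hat\lambda^{-2} \right]^{-1} = 0 \tag{17} $$

$$ \sum_{i=1}^{n} \left[ 1 + (x_i - \hat\theta)^2 \hat\lambda^{-2} \right]^{-1} = \tfrac{1}{2} n \tag{18} $$

where $\hat\theta$ and $\hat\lambda$ are the MLE for $\theta$ and $\lambda$, respectively. These
equations are then solved for $\hat\theta$ and $\hat\lambda$.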
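The maximum likelihood equations (17) and (18) have no closed-form solution, so they are solved iteratively. The sketch below is one possible approach and is not the program used in the thesis: it is an assumed reweighting iteration (the EM scheme for a Student t distribution with one degree of freedom) whose fixed point satisfies both equations. The starting values, tolerances, function names, and test data are all illustrative assumptions.

```python
import math
import random


def cauchy_mle(sample, tol=1e-12, max_iter=10000):
    """Iterate toward a solution of the two maximum likelihood equations."""
    xs = sorted(sample)
    n = len(xs)
    # Starting values (an assumption): median and half the interquartile range.
    theta = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    lam = 0.5 * (xs[(3 * n) // 4] - xs[n // 4]) or 1.0
    for _ in range(max_iter):
        weights = [1.0 / (1.0 + ((x - theta) / lam) ** 2) for x in xs]
        new_theta = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        spread = sum(w * (x - new_theta) ** 2 for w, x in zip(weights, xs))
        new_lam = math.sqrt(2.0 * spread / n)
        step = abs(new_theta - theta) + abs(new_lam - lam)
        theta, lam = new_theta, new_lam
        if step < tol:
            break
    return theta, lam


def equation_residuals(sample, theta, lam):
    """Left side minus right side of the two likelihood equations."""
    z = [(x - theta) / lam for x in sample]
    first = sum(zi / (1.0 + zi * zi) for zi in z)
    second = sum(1.0 / (1.0 + zi * zi) for zi in z) - 0.5 * len(sample)
    return first, second


rng = random.Random(7)
data = [3.0 + 2.0 * math.tan(math.pi * (rng.random() - 0.5)) for _ in range(50)]
theta_hat, lam_hat = cauchy_mle(data)
print(theta_hat, lam_hat, equation_residuals(data, theta_hat, lam_hat))
```

Checking the residuals of both equations at the returned estimates is a cheap way to confirm that the iteration has actually converged.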


The MLE is not the only estimation technique that could have been used
for this thesis. Another popular estimator is the BLUE, or best linear
unbiased estimator. However, a study by Haas, Bain, and Antle (17) concluded
that the MLE is a better estimator for the Cauchy distribution, since they
found the confidence intervals developed for the parameters were narrower
with the MLE than with the BLUE.


IV. Methodology


Chapter Overview

This chapter looks at the methodology used to complete this thesis. The
specific steps used in the Monte Carlo method to generate the critical value
tables will be looked at first, followed by a discussion of the steps used to do
the power study.

Generating  the  Critical  Value  Tables 

This thesis used the Monte Carlo method to generate critical value tables
for the Cauchy distribution. This method is a way to investigate the behavior
of probabilistic processes. It takes random numbers, chosen so that they
simulate the properties of the process being investigated, and observes their
behavior, from which conclusions can be drawn about that process (18:2-4).

Fig.  2  is  a  flow  chart  showing  the  logic  for  generating  the  critical  value 
tables  (6:13-14).  The  following  discussion  will  elaborate  on  those  steps: 

Fig. 2. Critical Value Flowchart

Step 1: Random Deviate Generation. To start the Monte Carlo analysis,
random Cauchy deviates need to be obtained. A commercially available
computer subroutine, GGCAY, was used to generate those deviates. It is part
of the International Mathematical and Statistical Library (IMSL) (20:Chapt G).

Step 2: Ordering the Random Deviates. Another IMSL subroutine, VSRTA,
was used to order the deviates.

Step 3: Estimating the Parameters. As mentioned in Chapter III, the MLE
was used to estimate the location and scale parameters. The actual computer
program for the MLE was derived from a program included in a text by D. F.
Andrews and P. J. Bickel (4:17).

Step  4:  Generate  the  Hypothesized  Distribution  Function  F(x).  With  the 
estimated  location  and  scale  parameters  from  Step  3  and  the  ordered  deviates 
from  Step  2,  equation  (12)  yields  the  hypothesized  CDF. 

Step  5:  Calculate  the  Modified  KS,  AD,  and  CVM  test  statistics. 

Equations (6), (8), and (10) are solved using the hypothesized CDF and the
ordered  random  deviates. 

Step  6:  Repeat  5000  times.  Steps  1  -  5  will  be  repeated  5000  times  to 
generate  5000  independent  KS,  AD,  and  CVM  statistics. 
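Steps 1 through 6 can be sketched as a single loop. The following is a scaled-down illustration, not the thesis program: it uses Python's standard library in place of the IMSL routines GGCAY and VSRTA, an assumed reweighting iteration in place of the MLE program adapted from Andrews and Bickel, only the KS statistic in its common two-sided form, and 200 repetitions rather than 5000. All function names, seeds, and tolerances are illustrative.

```python
import math
import random


def cauchy_cdf(x, theta, lam):
    return 0.5 + math.atan((x - theta) / lam) / math.pi


def cauchy_mle(xs, tol=1e-8, max_iter=500):
    """Assumed reweighting iteration for the location and scale estimates."""
    n = len(xs)  # xs must already be sorted
    theta = xs[n // 2] if n % 2 else 0.5 * (xs[n // 2 - 1] + xs[n // 2])
    lam = 0.5 * (xs[(3 * n) // 4] - xs[n // 4]) or 1.0
    for _ in range(max_iter):
        w = [1.0 / (1.0 + ((x - theta) / lam) ** 2) for x in xs]
        new_theta = sum(wi * x for wi, x in zip(w, xs)) / sum(w)
        new_lam = math.sqrt(
            2.0 * sum(wi * (x - new_theta) ** 2 for wi, x in zip(w, xs)) / n)
        step = abs(new_theta - theta) + abs(new_lam - lam)
        theta, lam = new_theta, new_lam
        if step < tol:
            break
    return theta, lam


def ks_statistic(ys):
    """Two-sided KS statistic from ordered hypothesized CDF values."""
    n = len(ys)
    return max(max((i + 1) / n - y, y - i / n) for i, y in enumerate(ys))


def simulate_ks(n, reps, seed=1985):
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):                                   # Step 6: repeat
        xs = sorted(math.tan(math.pi * (rng.random() - 0.5))
                    for _ in range(n))                      # Steps 1 and 2
        theta, lam = cauchy_mle(xs)                         # Step 3
        ys = [cauchy_cdf(x, theta, lam) for x in xs]        # Step 4
        stats.append(ks_statistic(ys))                      # Step 5
    return sorted(stats)


ks_values = simulate_ks(n=10, reps=200)
print(ks_values[0], ks_values[-1])
```

The sorted list returned by `simulate_ks` is the set of order statistics from which critical values are read off in the final step.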

Step 7: Determine the Critical Values. A bit of background is important
here to understand this step. Critical values are important in hypothesis
testing since these values are what will be checked to verify $H_0$. The whole
purpose of this thesis is to generate those critical value tables to use when
$H_0$ states that the actual distribution of the data is Cauchy. Since all the
values derived up until now are based on Cauchy random deviates, that $H_0$ is
true for our samples.


Steps 1 - 6 have generated 5000 order statistics for each of the GOF
tests. Combined with commonly used α levels, where α is the maximum
probability of rejecting a true $H_0$, all that needs to be determined is the point
where, in the range of the order statistics, each of the α levels falls. The
mirror image of this is to work with the percentiles, or the 1 - α levels.
These then become the minimum probability of accepting $H_0$ when true. The
points where those levels fall are the critical values.

To get the critical values, different techniques are available. A
straightforward technique is to select that order statistic which, as a
percentage of the total statistics, matches the percentile level, e.g., for the
80th percentile and 5000 order statistics, the critical value would be the
4000th one. This was the technique used by Green and Hegazy (16). Recently,
a more precise technique has been developed, that of plotting positions (29:7).

Plotting positions depend on the bootstrap method (13). The technique
involves locating the discrete order statistics on a continuous spectrum.
This is accomplished by taking the space between the statistics and
representing it as a piecewise linear function. With that function, it is
possible to interpolate between the discrete values of the statistics and get
more accurate critical values (29:7). The interpolation is done by plotting the
order statistics against a plotting position which represents the order
statistics on a zero to one scale.


There are many different plotting positions, and prior theses have looked
at them and did not find any significant difference between them when it
comes to calculating the EDF statistics (6;29;38). Harter (19) recently did an
extensive analysis of various of the plotting positions. One of his findings
was that as samples increased over a sample size of 20, the differences
between them for the positions they determined were insignificant. With
5000 independent values for each test statistic, one plotting position for this
thesis is justified.

This thesis will use the median rank plotting convention. Harter shows
this to be closely approximated by (19:1617):

Y_i = (i - 0.3)/(n + 0.4)    (19)

where

i = 1,...,n
n = 5000
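
As a quick illustration, the median rank positions can be computed in a few lines. This sketch assumes the usual form of the median rank approximation, (i - 0.3)/(n + 0.4), which keeps every position strictly between 0 and 1; the function name `median_rank_positions` is hypothetical.

```python
import numpy as np

def median_rank_positions(n):
    """Approximate median rank plotting positions Y_1, ..., Y_n."""
    i = np.arange(1, n + 1)          # ranks 1..n
    return (i - 0.3) / (n + 0.4)     # assumed form of Eq (19)

positions = median_rank_positions(5000)
```

For n = 5000 the first position is 0.7/5000.4 and the last is 4999.7/5000.4, so the interval [0,1] still has to be closed off separately at both ends.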

Ream (29:11-23) gives an in-depth illustration of how plotting positions
are used to determine critical values; therefore only a brief overview will be
given here. The order statistics X_{(i)} are plotted along the
abscissa, while the 5000 plotting positions Y_i are plotted along the ordinate.
Both sets of points are assigned to positions 2 to 5001. For the
plotting positions, the interval [0,1] is completed by setting the first
position to be Y_0 = 0, and the 5002nd position to be Y_{5001} = 1. For the
order statistics, linear extrapolation is used to determine the first and last
entries. The first entry is made by linearly extrapolating from the first and
second order statistics, limited by a nonnegativity restriction. The last entry
is similarly extrapolated from the last and next to last order statistics. For
the purposes of the computer program used to generate these values, an array of
5002 values was used for each axis.

The extrapolation of X_{(0)} and X_{(5001)} uses Y = mX + b, the linear
slope-intercept formula. The first endpoint is calculated as follows:

m = (Y_2 - Y_1)/(X_{(2)} - X_{(1)})    (20)

b = Y_1 - m X_{(1)}    (21)

X_{(0)} = -b/m    (22)

Since a negative value is not allowed, the minimum value for X_{(0)} is 0,
leading to

X_{(0)} = max(0, -b/m)    (23)

Similarly, the value for X_{(5001)} can be found.
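
The endpoint extrapolation can be sketched as below. This is an illustrative Python version under stated assumptions, not the thesis program: the function name `extrapolate_endpoints` is hypothetical, the first line is solved for Y = 0 and the last for Y = 1, and the two order statistics at each end are assumed to be distinct so that the slopes are defined.

```python
import numpy as np

def extrapolate_endpoints(x, y):
    """Return (X_first, X_last) completing the order statistics.

    x -- order statistics, sorted ascending
    y -- matching plotting positions, strictly between 0 and 1
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    # Line through the first two points, solved for Y = 0.
    m = (y[1] - y[0]) / (x[1] - x[0])
    b = y[0] - m * x[0]
    x_first = max(0.0, -b / m)       # nonnegativity restriction

    # Line through the last two points, solved for Y = 1.
    m = (y[-1] - y[-2]) / (x[-1] - x[-2])
    b = y[-1] - m * x[-1]
    x_last = (1.0 - b) / m
    return x_first, x_last
```

The nonnegativity restriction matters because the GOF statistics cannot be negative, while a straight line through the two smallest values can cross Y = 0 to the left of zero.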

If straight lines are used between all the 5002 points, a piecewise linear
function is produced. At this point linear interpolation can be used to
determine any value that falls between two consecutive points, which is
necessary in order to calculate the critical values. For example, to find the
85th percentile, the largest plotting position, Y_i, less than .85 is found.
Then the corresponding X_{(i)}, along with X_{(i+1)} and Y_{i+1}, are used to
linearly interpolate the critical value using:

m = (Y_{i+1} - Y_i)/(X_{(i+1)} - X_{(i)})    (24)

b = Y_i - m X_{(i)}    (25)

Critical Value = (p - b)/m    (26)

where p is the percentile expressed as a proportion (here, .85).
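
The interpolation step can be sketched in Python as follows. The function name `interpolate_critical_value` is hypothetical, the inputs are assumed to be the completed arrays (endpoints included) with strictly increasing plotting positions and no tied order statistics, and NumPy's `searchsorted` is used here only as a convenient way to find the largest plotting position below p.

```python
import numpy as np

def interpolate_critical_value(X, Y, p):
    """Linearly interpolate the critical value at proportion p.

    X -- completed order statistics (ascending, endpoints included)
    Y -- completed plotting positions, from 0 to 1
    p -- percentile as a proportion, e.g. 0.85
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    # Index of the largest plotting position strictly less than p.
    i = int(np.searchsorted(Y, p, side="left")) - 1
    i = min(max(i, 0), Y.size - 2)          # stay inside the arrays
    m = (Y[i + 1] - Y[i]) / (X[i + 1] - X[i])   # slope of the segment
    b = Y[i] - m * X[i]                         # intercept of the segment
    return (p - b) / m                          # solve p = m X + b for X
```

For points on a common grid this gives the same answer as `np.interp(p, Y, X)`; the explicit form is shown only because it mirrors the slope, intercept, and solve steps described in the text.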

The critical value percentiles used for this thesis were 80, 85, 90, 95, and 99.

Step 8: Repeat steps 1-7 for each of the sample sizes 5, 10, 15, 20, 25, and 30.
These sample sizes have been used in developing critical value tables for
other distributions (6;27).


The  resulting  critical  value  tables  are  in  Appendix  A. 


The  Power  Study 


Once the critical value tables are generated, this thesis then compares
the power of the three test statistics against several alternate distributions.
As mentioned previously, the concept of power is important when using EDF
statistics, since the intent with the hypothesis testing is to accept the null
hypothesis. At the same time, one wants to feel confident that the alternate
hypothesis is false. By having a comparison of the power of the three
statistics, someone testing for the Cauchy distribution can select the test
which best protects against likely alternate distributions.

The alternate distributions used for this thesis are the Weibull, with
shape parameter of 3.5, the Gamma, with shape parameter of 2.0, the Beta,
with the P and Q parameters of 2 and 3, respectively, the Exponential, with
the shape parameter of 2.0, the Normal, and the Double Exponential, with the
shape parameter of 2.0.
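
For readers who want to reproduce a comparable set of alternates, the following Python sketch draws one sample from each distribution with NumPy's random generator, used here as a stand-in for the IMSL subroutines. Several choices are assumptions and not taken from the thesis: the function name `alternate_samples`, unit scale for the Weibull and Gamma, a standard Normal, and reading the "shape parameter of 2.0" for the Exponential and Double Exponential as a scale of 2.0.

```python
import numpy as np

def alternate_samples(n, seed=0):
    """Draw one sample of size n from each alternate distribution."""
    rng = np.random.default_rng(seed)
    return {
        "weibull": rng.weibull(3.5, size=n),            # shape 3.5
        "gamma": rng.gamma(2.0, size=n),                # shape 2.0, scale 1 (assumed)
        "beta": rng.beta(2.0, 3.0, size=n),             # P = 2, Q = 3
        "exponential": rng.exponential(2.0, size=n),    # 2.0 taken as scale (assumed)
        "normal": rng.standard_normal(n),               # standard Normal (assumed)
        "double_exponential": rng.laplace(0.0, 2.0, size=n),  # 2.0 as scale (assumed)
    }
```

Because the tests estimate location and scale from the sample, the power against a location-scale family such as the Normal should not depend on which member of the family is drawn, provided the estimators are invariant.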

The logic of the power study basically follows that of the critical value
table generation, except that instead of starting with Cauchy random
deviates, deviates from the above named distributions are used. Since the
program to accomplish the power study is simpler and less time consuming
when run, the number of statistics calculated for each distribution was set at
10,000 instead of the 5,000 used for the critical value tables.


The first step in the power study involved generating random deviates
for the alternate distributions using IMSL subroutines. Then, since the null
hypothesis is that the underlying distribution is Cauchy, steps 2, 3, 4, and 5
of Fig. 2 were performed. This involved ordering the data, estimating the
parameters, computing the hypothesized F(x), and calculating the test
statistics. These statistics are then compared with the critical values
generated for each respective test for alphas of .05 and .01. A counter is
incremented each time the calculated test statistic exceeds the critical
value. This tracks how many times the null hypothesis is correctly rejected.
Then, the total number of rejections is divided by 10,000 to obtain the power.
This is repeated for each of the alternate distributions, and finally for each
of the different sample sizes (5, 15, 25, 50). The resulting power comparison
tables are in Appendix B.
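
The counting logic of the power study can be sketched as below for the KS statistic. This is only an outline under loud assumptions: the function names are hypothetical, and the location and scale are estimated here by the sample median and half the interquartile range, a simple invariant stand-in that is not claimed to be the estimator used in this thesis. Critical values from Appendix A therefore should not be expected to reproduce the published powers with this stand-in.

```python
import numpy as np

def ks_cauchy_statistic(sample):
    """KS distance between the sample EDF and a fitted Cauchy CDF."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = x.size
    # Stand-in estimators (assumption): median and half the IQR.
    q1, med, q3 = np.percentile(x, [25, 50, 75])
    loc, scale = med, (q3 - q1) / 2.0
    F = 0.5 + np.arctan((x - loc) / scale) / np.pi   # Cauchy CDF
    i = np.arange(1, n + 1)
    return max(np.max(i / n - F), np.max(F - (i - 1) / n))

def estimate_power(draw, n, critical_value, reps=10_000, seed=0):
    """Fraction of samples whose statistic exceeds the critical value."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        if ks_cauchy_statistic(draw(rng, n)) > critical_value:
            rejections += 1              # null hypothesis rejected
    return rejections / reps
```

A call such as `estimate_power(lambda rng, n: rng.exponential(2.0, n), 25, cv)` would then estimate the power against an Exponential alternate at n = 25, where `cv` is a critical value generated with the same estimator.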


V. Use of the Tables


Chapter  Overview 

This  chapter  will  discuss  the  basic  procedure  involved  in  using  the 
critical  value  tables  generated  in  this  thesis. 

Use  of  the  Tables 

The critical value tables will be used to determine whether or not to
accept the null hypothesis, that the distribution of the sample data points,
F_n(x), is the Cauchy distribution, F(x). The appropriate statistic is
calculated using equation (6), (8), or (10). The calculated statistic is
compared to the critical value in the tables (for a given n and α) and if the
statistic value is greater than the critical value, the null hypothesis is
rejected.

The following steps are used in the above analysis (6:28-29):

1. The user will select the appropriate α-level and sample size. As
stated earlier, α sets the maximum probability of rejecting the null
hypothesis when it is true.

2.  Select  n  random  observations  from  the  total  population.  Order 
these  observations  from  the  smallest  to  the  largest. 

3. Estimate the location and scale parameters for the sample. The
estimator must be invariant for the results to be meaningful.

4.  Specify  the  Cauchy  distribution  using  the  above  estimated  location 
and  scale  parameter. 

5.  Calculate  the  test  statistic  of  interest  --  KS,  CVM  or  AD.  This  can 
be  done  using  Eq  (6),  (8),  or  (10),  respectively. 

6. Given the n and α, look up the critical value from the tables in
Appendix A.

7. If the test statistic value is greater than the critical value, then
the null hypothesis is rejected. However, if the test statistic value is less
than or equal to the critical value, then there is a failure to reject the null
hypothesis. The conclusion is that there is insufficient evidence to reject the
null hypothesis.
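
Steps 6 and 7 amount to a table lookup followed by a comparison, which can be sketched as follows. The dictionary holds a few entries copied from Table I (modified KS test) purely for illustration; the names `KS_CRITICAL` and `decide` are hypothetical, and the statistic is assumed to have been computed already with Eq (6).

```python
# A few modified KS critical values from Table I, keyed by (alpha, n).
KS_CRITICAL = {
    (0.05, 10): 0.2544265,
    (0.05, 25): 0.1698837,
    (0.01, 10): 0.2967503,
    (0.01, 25): 0.2011247,
}

def decide(statistic, alpha, n, table=KS_CRITICAL):
    """Compare a computed test statistic with the tabled critical value."""
    critical_value = table[(alpha, n)]
    if statistic > critical_value:
        return "reject H0"
    return "fail to reject H0"
```

For example, a KS statistic of 0.30 with n = 10 and α = .05 exceeds 0.2544265, so the null hypothesis of a Cauchy distribution would be rejected.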


VI.  Results 


Chapter  Overview 

This  chapter  discusses  the  results  of  this  thesis  --  the  critical  value 
tables  and  the  power  tables. 

Critical  Value  Tables 

The  critical  value  tables  for  the  modified  KS,  CVM,  and  AD  tests  are  in 
Appendix  A.  They  are  organized  by  alpha  level  (.20,  .15,  .10,  .05,  .01)  and 
sample  size  (5,10,15,20,25,30,50). 

The KS critical values all decrease as n increases and as α increases. The
rate of decrease slows down with increasing n and α. It could be that if n
were increased to 40 or 50 the critical values would stabilize at some value.
The CVM and the AD critical values also decrease, but only as the α-level
increases while n is held constant. With α constant, there is little change
with changing n; in fact, the statistics stay very close together, with slight
fluctuation.

Since the critical values are generated through a Monte Carlo process,
there is a degree of variability introduced. The error of a Monte Carlo process
is proportional to 1/(N)^{1/2}, with N being the number of iterations of the
simulation (6:33). Therefore, by running the simulation with N = 10000 rather
than 5000, some of the patterns seen could change. However, due to the
greatly increased computer time needed to go from 5000 to 10000, that
option was not possible for this thesis.
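
As a worked illustration of this proportionality (a sketch that assumes the constant of proportionality is the same for both run lengths), doubling the number of iterations gives:

```latex
\frac{\text{error}(N = 10000)}{\text{error}(N = 5000)}
  = \frac{1/\sqrt{10000}}{1/\sqrt{5000}}
  = \frac{1}{\sqrt{2}} \approx 0.707
```

That is, twice the computer time would be expected to reduce the Monte Carlo error by only about 29 percent.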

Power  Tables 

The first "alternate" distribution used in the power study was the
Cauchy. This was done to validate the values generated in the first part of
the thesis. To be valid, the power would have to be close to the α-levels, and
that is the case. The powers are not exactly equal to the α-levels, but that is
a result of the variability in the Monte Carlo process, as mentioned above.
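
A rough sense of how close is "close" can be had from the binomial standard error of an estimated rejection rate. This sketch assumes 10,000 independent replications and treats the critical values as exact:

```latex
\mathrm{SE}(\hat{\alpha}) = \sqrt{\frac{\alpha(1 - \alpha)}{N}}, \qquad
\alpha = .05:\ \sqrt{\frac{(.05)(.95)}{10000}} \approx .0022, \qquad
\alpha = .01:\ \sqrt{\frac{(.01)(.99)}{10000}} \approx .0010
```

Since the critical values themselves carry Monte Carlo error from the 5000-repetition runs, somewhat larger departures from the nominal α-levels would not be surprising.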

Among the three tests, the KS is consistently more powerful, across all
n and α. There are only 3 or 4 instances where it comes in second and then
only in the third significant digit. This could be a result of the KS being
fairly insensitive to discrepancies in the tails. The Cauchy has longer and
flatter tails and the KS might be deemphasizing the difference there.

Analyzing by sample size shows very poor power at n = 5 (.085 being the
highest), with power increasing as n increases. When one gets to a sample
size of 50, three distributions, the Exponential, the Gamma, and the Beta,
have powers above 0.5 (1.0, .947, .586 respectively). This makes sense as the
amount of information available increases with sample size.

When α-levels are analyzed, for α = .01, the only reasonable power is
against the Exponential and the Gamma, with a power of .991 and .719
respectively. The next best is only .176. As α increases to .05, power
increases across the board, getting up to 1.0 for the Exponential. When
looking at the alternate distributions, the distributions with reasonable
power are the Exponential, the Gamma, and the Beta. The highest power
among all the other distributions is .259, not enough to instill any confidence
in the results of the GOF test.

Given the above analysis, if the Cauchy is the distribution in the null
hypothesis, one should try for a sample size of 50 or better, and if that is not
possible, accept an α of .05 or greater.


VII. Conclusions and Recommendations


Chapter Overview


This  chapter  gives  the  conclusions  reached  in  this  thesis,  and  the 
recommendations  made  for  further  study. 

Conclusions 

1. The critical value tables generated for the Kolmogorov-Smirnov, the
Cramer-von Mises, and the Anderson-Darling goodness-of-fit tests for the
Cauchy distribution are valid. By using the Cauchy as one of the "alternate"
distributions when doing the power study, the values were validated.

2. Regarding the choice of a test, if the alternate hypothesis is the
Exponential, the Gamma, or the Beta, and the sample size is greater than five,
then all three of the tests are fairly powerful. However, the
Kolmogorov-Smirnov test is the most powerful of the three in any situation.

3. After analyzing the power test, there is a good deal of power
available against the Exponential, the Gamma, and the Beta.


Recommendations


1. If possible, the critical value tables should be redone with 10,000
statistics. This would reduce the variability due to the Monte Carlo process
and would make one more certain of the patterns evident in the tables.

2. With the improvement in power evidenced in the power tables, further
power studies should be attempted with larger sample sizes and α-levels.
The goal should be to find what combinations would increase the power
against the weaker distributions (Weibull, Normal, and Double Exponential).

3. Other distributions should be investigated in further power studies to
determine if there are other distributions against which the GOF tests are
also powerful.


APPENDIX  A 

This includes critical value tables for the Kolmogorov-Smirnov, the
Cramer-von Mises, and the Anderson-Darling Tests.


TABLE I

CRITICAL VALUES FOR THE MODIFIED KS TEST

ALPHA     N     CRITICAL VALUE

 .20      5     0.2898729
 .20     10     0.2089296
 .20     15     0.1745039
 .20     20     0.1542405
 .20     25     0.1385744
 .20     30     0.1271362
 .20     50     0.0989140

 .15      5     0.3054360
 .15     10     0.2196148
 .15     15     0.1839646
 .15     20     0.1619531
 .15     25     0.1453200
 .15     30     0.1338064
 .15     50     0.1035580

 .10      5     0.3252105
 .10     10     0.2335990
 .10     15     0.1960747
 .10     20     0.1727150
 .10     25     0.1548475
 .10     30     0.1435490
 .10     50     0.1099358

 .05      5     0.3480030
 .05     10     0.2544265
 .05     15     0.2142230
 .05     20     0.1878866
 .05     25     0.1698837
 .05     30     0.1564334
 .05     50     0.1200329

 .01      5     0.3840281
 .01     10     0.2967503
 .01     15     0.2463341
 .01     20     0.2202671
 .01     25     0.2011247
 .01     30     0.1826919
 .01     50     0.1413185

TABLE II

CRITICAL VALUES FOR THE MODIFIED CVM TEST

ALPHA     N     CRITICAL VALUE

 .20      5     0.0971847
 .20     10     0.0910045
 .20     15     0.0914936
 .20     20     0.0530557
 .20     25     0.0918603
 .20     30     0.0925299
 .20     50     0.0911552

 .15      5     0.1148650
 .15     10     0.1064403
 .15     15     0.1068219
 .15     20     0.1087681
 .15     25     0.1075684
 .15     30     0.1090?52
 .15     50     0.1055738

 .10      5     0.1364753
 .10     10     0.1290915
 .10     15     0.1262527
 .10     20     0.1290561
 .10     25     0.1304703
 .10     30     0.1311479
 .10     50     0.1266534

 .05      5     0.1668584
 .05     10     0.1643178
 .05     15     0.1663567
 .05     20     0.1694394
 .05     25     0.1711696
 .05     30     0.1705763
 .05     50     0.1629795

 .01      5     0.2162196
 .01     10     0.2393059
 .01     15     0.2558281
 .01     20     0.2500614
 .01     25     0.2658464
 .01     30     0.2640329
 .01     50     0.2547385

TABLE III

CRITICAL VALUES FOR THE MODIFIED AD TEST

ALPHA     N     CRITICAL VALUE

 .20      5     0.7511249
 .20     10     0.7006519
 .20     15     0.7101663
 .20     20     0.7105355
 .20     25     0.6993738
 .20     30     0.7045876
 .20     50     0.6554132

 .15      5     0.8039264
 .15     10     0.8087849
 .15     15     0.8106228
 .15     20     0.8156878
 .15     25     0.7557106
 .15     30     0.8138550
 .15     50     0.7880611

 .10      5     1.?459634
 .10     10     0.9686199
 .10     15     0.9728056
 .10     20     0.9603571
 .10     25     0.9719902
 .10     30     0.9705017
 .10     50     0.9372754

 .05      5     1.3589739
 .05     10     1.2246506
 .05     15     1.2866855
 .05     20     1.2586166
 .05     25     1.2332463
 .05     30     1.2496085
 .05     50     1.1945103

 .01      5     2.1669748
 .01     10     1.8335395
 .01     15     1.9464778
 .01     20     1.8302892
 .01     25     1.9715600
 .01     30     1.5268377
 .01     50     1.8649824

APPENDIX B

[Power comparison tables for the modified KS, CVM, and AD tests. Surviving
column labels: BETA (P = 2, Q = 3), EXPON (SH = 2.), DBLEXP (SH = 2.).]



Bibliography

1. Amstadter, Bertram L. Reliability Mathematics. New York: McGraw-Hill,
1971.

2. Anderson, T.W. and D.A. Darling. "Asymptotic Theory of Goodness of Fit
Criteria Based on Stochastic Processes," Annals of Mathematical Statistics,
23: 193-212 (1952).

3. Anderson, T.W. and D.A. Darling. "A Test of Goodness of Fit," Journal of
the American Statistical Association, 49: 765-769 (Dec. 1954).

4. Andrews, D.F. and P.J. Bickel. Robust Estimates of Location. New Jersey:
Princeton University Press, 1972.

5. Banks, Jerry and John S. Carson. Discrete-Event System Simulation.
Englewood Cliffs: Prentice-Hall, 1984.

6. Bush, John G. A Modified Cramer-Von Mises and Anderson-Darling Test
for the Weibull Distribution with Unknown Location and Scale Parameters. MS
Thesis, GOR/MA/81D. School of Engineering, Air Force Institute of Technology
(AU), Wright-Patterson AFB, OH, December 1981.

7. Conover, W.J. Practical Nonparametric Statistics (Second Edition). New
York: John Wiley and Sons, 1980.

8. Copas, J.B. "On the Unimodality of the Likelihood for the Cauchy
Distribution," Biometrika, 62: 701-704 (1975).

9. Darling, D.A. "The Kolmogorov-Smirnov, Cramer-Von Mises Tests," The
Annals of Mathematical Statistics, 28: 823-838 (1957).

10. David, F.N. and N.L. Johnson. "The Probability Integral Transformation
When Parameters are Estimated from the Sample," Biometrika, 35: 182-190
(1948).


11. Durbin, J. and M. Knott. "Components of Cramer-von Mises Statistics. I,"
Journal of the Royal Statistical Society, 34(2): 290-307 (1972).

12. Easterling, Robert G. "Goodness of Fit and Parameter Estimation,"
Technometrics, 18: 1-9 (1976).

13. Efron, B. "Bootstrap Methods: Another Look at the Jackknife," The Annals
of Statistics, 7: 1-26 (1979).

14. Gabrielsen, Gorm. "On the Unimodality of the Likelihood for the Cauchy
Distribution: Some Comments," Biometrika, 69: 677-678 (1982).

15. Granger, Clive W.J. and Daniel Orr. "Infinite Variance and Research
Strategy in Time Series Analysis," Journal of the American Statistical
Association, 67: 275-285 (1972).

16. Green, J.R. and Y.A.S. Hegazy. "Powerful Modified EDF Goodness-of-Fit
Tests," Journal of the American Statistical Association, 71: 204-209 (1976).

17. Haas, Gerald and Lee Bain and Charles Antle. "Inferences for the Cauchy
Distribution Based on Maximum Likelihood Estimators," Biometrika, 57(2):
403-408 (1970).

18. Hammersley, J.M. and D.C. Handscomb. Monte Carlo Methods. London:
Methuen and Co., 1967.

19. Harter, H.L. "Another Look at Plotting Positions," Communications in
Statistics, A13(13): 1613-1633 (1984).

20. International Mathematics and Statistics Library Reference Manual -
0006. Houston: IMSL, 1980.


21. Johnson, Norman L. and Samuel Kotz. Continuous Univariate Distributions
- 1. Boston: Houghton Mifflin, 1970.

22. Jonson, Edward C. Conditional Nearly Best Linear Estimation of the
Location and Scale Parameters of the Cauchy Distribution by the Use of
Censored Order Statistics. MS Thesis, GRE/MATH/69-4. School of
Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB,
OH, December 1969.

23. Kendall, Maurice G. and Alan Stuart. The Advanced Theory of Statistics:
Volume 2. New York: Hafner Publishing, 1961.

24. Knight, Frank B. "A Characterization of the Cauchy Type," Proceedings of
the American Mathematical Society, 55: 130-135 (1976).

25. Lilliefors, Hubert W. "On the Kolmogorov-Smirnov Test for Normality
with Mean and Variance Unknown," Journal of the American Statistical
Association, 62: 399-402 (1967).

26. Lilliefors, Hubert W. "On the Kolmogorov-Smirnov Test for the
Exponential Distribution with Mean Unknown," Journal of the American
Statistical Association, 64: 387-389 (1969).

27. Mann, Nancy R. and Ernest M. Scheuer and Kenneth W. Fertig. "A New
Goodness-of-Fit Test for the Two-Parameter Weibull or Extreme Value
Distribution with Unknown Parameters," Communications in Statistics, 2(5):
383-400 (1973).

28. Massey, Frank J. Jr. "The Kolmogorov-Smirnov Test for Goodness of Fit,"
Journal of the American Statistical Association, 46: 68-78 (1951).

29. Ream, Thomas J. A New Goodness of Fit Test for Normality With Mean
and Variance Unknown. MS Thesis, GOR/MA/81D. School of Engineering, Air
Force Institute of Technology (AU), Wright-Patterson AFB, OH, December
1981.

30. Rider, Paul R. "Distributions of Product and Quotient of Cauchy
Variables," American Mathematical Monthly, 72: 303-305 (1965).

31. Schuster, Eugene F. "On the Goodness-of-Fit Problem for Continuous
Symmetric Distributions," Journal of the American Statistical Association,
68: 713-715 (1973).

32. Spitzer, F. "Some Theorems Concerning 2-Dimensional Brownian Motion,"
Transactions of the American Mathematical Society, 87: 187-197 (1958).

33. Stephens, M.A. "EDF Statistics for Goodness of Fit and Some
Comparisons," Journal of the American Statistical Association, 69: 730-737
(1974).

34. Stephens, M.A. "Asymptotic Results for Goodness-of-Fit Statistics With
Unknown Parameters," Annals of Statistics, 4: 357-369 (1976).

35. Stephens, M.A. The Anderson-Darling Statistic. Grant
DAAG29-77-G-0031. U.S. Army Research Office, Stanford University,
Stanford, CA, October 1979 (AD-A079 807).


36. Viviano, Philip J. A Modified Kolmogorov-Smirnov, Anderson-Darling, and
Cramer-von Mises Test for the Gamma Distribution with Unknown Location
and Scale Parameters. MS Thesis, GOR/MA/82D. School of Engineering, Air
Force Institute of Technology (AU), Wright-Patterson AFB, OH, December 1982.

37. Winkler, Robert L. and William L. Hays. Statistics: Probability,
Inference, and Decision (Second Edition). New York: Holt, Rinehart, and
Winston, 1975.

38. Woodbury, Larry B. A New Goodness of Fit Test for the Uniform
Distribution With Unspecified Parameters. MS Thesis, GOR/MA/82D. School
of Engineering, Air Force Institute of Technology (AU), Wright-Patterson AFB,
OH, December 1982.

39. Yoder, John D. Modified Kolmogorov-Smirnov, Anderson-Darling and
Cramer-von Mises Tests for the Logistic Distribution with Unknown Location
and Scale Parameters. MS Thesis, GOR/ENC/83D. School of Engineering, Air
Force Institute of Technology (AU), Wright-Patterson AFB, OH, December 1983.


VITA 


Captain Frank Ocasio was born on 16 August 1953 in New York City, New
York. He graduated from high school in New York City in 1971 and attended
Rensselaer Polytechnic Institute from which he received the degree of
Bachelor of Science in Management in August 1974. He was then employed by
Carroll's Development Corporation as a restaurant manager until he entered
Officer Training School in February 1976. He was commissioned a second
lieutenant in May 1978. His first assignment was to the 756th Radar
Squadron at Finland Air Force Station, Minnesota as a Logistics Support
Officer. In May 1980 he was assigned to the 3515th USAF Recruiting
Squadron as an OTS Officer. In November 1981 he was assigned to DCS
Recruiting Service, HQ Air Training Command, Randolph AFB, Texas where he
served as a staff officer. While at Randolph, he attended St. Mary's University
during the evenings until June 1983 when he received the degree of Master of
Science in Systems Engineering. He entered the School of Engineering, Air
Force Institute of Technology, in May 1984.

Permanent address: 7667 Torraqp Drive
                   San Antonio, Texas 78239




REPORT DOCUMENTATION PAGE

1a. Report Security Classification: UNCLASSIFIED
3. Distribution/Availability of Report: Approved for public release;
   distribution unlimited
4. Performing Organization Report Number: AFIT/GSO/MA/85D-5
6a. Name of Performing Organization: School of Engineering,
    AF Inst of Technology
6b. Office Symbol: AFIT/ENC
6c. Address: Wright-Patterson AFB OH 45433
11. Title: See Box 19
12. Personal Author(s): Frank Ocasio, Capt, USAF
13a. Type of Report: MS Thesis
14. Date of Report: 1985 December
17. COSATI Codes: Field 12, Group 01
18. Subject Terms: Nonparametric Statistics, Statistical Tests, Statistical
    Distributions, Distribution Functions
19. Abstract:
    Title: A MODIFIED KOLMOGOROV-SMIRNOV, ANDERSON-DARLING, AND CRAMER-VON
    MISES TEST FOR THE CAUCHY DISTRIBUTION WITH UNKNOWN LOCATION AND SCALE
    PARAMETERS
    Thesis Advisor: Dr. Albert H. Moore, Professor
21. Abstract Security Classification: UNCLASSIFIED
22a. Name of Responsible Individual: Dr. Albert H. Moore, Professor
22b. Telephone Number: (513) 255-3098
22c. Office Symbol: AFIT/ENC

ABSTRACT 


The Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises
critical values are generated for the Cauchy distribution. The
critical values are used for testing the null hypothesis that a set
of observations follows a Cauchy distribution when the location and
scale parameters are unknown and estimated from the sample. A Monte
Carlo simulation, using 5000 repetitions, was used to generate the
critical values for sample sizes of 5(5)30 and 50.

A power study was performed using Monte Carlo simulation for
the Kolmogorov-Smirnov, Anderson-Darling, and Cramer-von Mises tests.
Sample sizes of 5, 15, 25 and 50 were used for six alternate
distributions, for alpha levels of .05 and .01. Analyzing by sample size shows
very poor power for a sample size of five. As the sample size increases
so does the power, so that at a sample size of fifty, the powers against
three of the six distributions are .5 or better. Among the three tests,
the Kolmogorov-Smirnov is consistently more powerful, regardless of
sample size or alpha level.


